1 Task and executive summary

In this report, a 2D resource evaluation of a Co-Ni deposit is conducted. In the process, different interpolation and cosimulation methods are also compared with regard to their error. The assigned files contain several layers of data from different parts of the same nickel laterite deposit (Murrin Murrin, Western Australia). Four lithologies are reported:

  • FZ = ferruginous zone,
  • SA = saprolitic zone,
  • SM = smectitic zone, and
  • UM = “fresh” ultramafic rocks

The metallogenetic model considers that Ni (and, to a lesser degree, Co) is washed out of the FZ and concentrated in the SA and SM zones, while Mg is almost completely removed from the system. Thus, there are systematic changes in the composition of the ore depending on the zone.

This work consists of an analysis of this data set, comprising an exploratory data analysis, a spatial exploratory analysis, variography, indicator kriging and cokriging as well as cosimulation, and cross-validation, to evaluate the resource distribution in one slice of the deposit and to obtain an overview of the effect of choosing different interpolation methods. In total, six methods are compared: three kriging methods and three Gaussian simulation methods. The methods are as follows:

Kriging                                                  Simulation
VK1  Cokriging w. log-transformed values                 VS1  Cosimulation w. log-transformed values
VK2  Mixed-model cokriging w. log-transformed            VS2  Mixed-model cosimulation w. log-transformed
     values and lithology                                     values and lithology
VK3  Logratio-composition based cokriging                VS3  Logratio-composition based cosimulation
IK   Indicator kriging for lithology with anisotropic model

As such, the experimental plan broadly follows the scheme of comparing

  1. Gaussian simulation methods to “classic” kriging methods and
  2. different levels of model complexity with regard to their impact on the result

While this testing plan does not directly follow a Design of Experiments approach, it can be viewed as a two-dimensional testing plan with two levels for (1) and three levels for (2).

The variants are then compared to each other, and a final model is chosen and applied for the interpretation of the results. The selection of the final model is based on the mean absolute error and cross-validation scatterplots.
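As a minimal sketch of the first comparison criterion (function and variable names hypothetical, not taken from the report's code), the mean absolute error is simply:

```r
# Mean absolute error between observed and predicted values,
# as used for ranking the interpolation variants (sketch only).
mae <- function(observed, predicted) {
  mean(abs(observed - predicted), na.rm = TRUE)
}

# toy example
mae(c(1, 2, 3), c(1.5, 2, 2))  # 0.5
```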

For the final model, grade-tonnage curves are computed to estimate the resource potential of the area. Since the final model uses simulation methods, a manifold of grade-tonnage curves is produced to estimate the variation and uncertainty of the resource estimate.
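As a hedged sketch of how one grade-tonnage curve could be derived from a set of block grades (names and values hypothetical; blocks of equal tonnage assumed):

```r
# Sketch: tonnage above cutoff and mean grade above cutoff for one
# realisation, assuming blocks of equal tonnage (illustrative only).
grade_tonnage <- function(grades, cutoffs, block_tonnage = 1) {
  data.frame(
    cutoff  = cutoffs,
    tonnage = sapply(cutoffs, function(z) sum(grades >= z) * block_tonnage),
    mean_grade = sapply(cutoffs, function(z) {
      sel <- grades >= z
      if (any(sel)) mean(grades[sel]) else NA_real_
    })
  )
}

gt <- grade_tonnage(grades = c(0.2, 0.5, 0.8, 1.1), cutoffs = c(0, 0.5, 1))
```

Repeating this over every simulated realisation yields the family of curves used to assess uncertainty.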

The dataset is part of the data of a study published as: [1] - Talebi, H., Mueller, U., Tolosana-Delgado, R. et al. Geostatistical Simulation of Geochemical Compositions in the Presence of Multiple Geological Units: Application to Mineral Resource Evaluation. Math Geosci 51, 129–153 (2019). https://doi.org/10.1007/s11004-018-9763-9

Notice: The nature of this work is largely driven by graphical assessments and graphing. To maintain a certain brevity, some subsections of this work are structured in parallel. This means that the reader can activate the subsection he or she wishes to see. This also allows details not necessary for the core understanding of the work to be shown. Code sections used to compile the calculations can be expanded by clicking on Code.

2 Preparations

For the assembly of this work, the following packages are used:

#install.packages("funModeling")
#install.packages("gridExtra")
#install.packages("corrplot")
#install.packages("kableExtra")
#install.packages("vcd")
#install.packages("cvms") # for confusion metrics and matrices
#install.packages("plotly") # for interactive plots, is not loaded into the namespace since it overwrites some dplyr functions
#install.packages("ggnewscale")
#install.packages("rasterly")

library("rasterly")     # fast plotting of larger spatial datasets with plotly
library("ggnewscale")   # allows combination of cont. and factorial data in one ggmap
library("sp")           # contains spatial data classes
library("gstat")        # classical geostatistics 
library("RColorBrewer") # color palettes
library("magrittr")     # pipe %>% 
library("dplyr")        # data manipulation verbs
library("gmGeostats")   # gstat re-interfaced / for swath plots
library("ggplot2")      # nicer plots
library("data.table")   # for fast data frame manipulation
library("knitr")        # for nicer report formatting
library("kableExtra")   # for table formatting
library("corrplot")     # for correlation visualisation
library("funModeling")  # helper functions for exploratory data analysis
library("compositions") # compositional modelling
library("gridExtra")    # for arranging multiple ggplots in a grid similar to par(mfrow)
library("vcd")          # slightly improved optics for mosaic plot
library("readr")        # more comfortable CSV reading

The provided dataset consists of 12 columns and 735 observations. By default, the columns X1-Ni are imported as numeric. “Hard”, “Lcode” and “subset” can be considered factor variables and are therefore converted accordingly for further use.


#import

dt <- read_csv("Grafe.csv") %>% as.data.table()

#convert columns to factor
fkt <- c("Hard", "Lcode", "subset" )
dt[,(fkt) := lapply(.SD, factor), .SDcols = fkt]

#print data table
dt %>% head() %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
X1 X Y Al Co Fe Filler Mg Ni Hard Lcode subset
23 1325 225 4.70 0.001 41.50 53.685 0.09 0.024 3 FZ training
32 1325 325 1.43 0.011 11.93 78.699 7.70 0.230 0 FZ training
52 1300 250 3.60 0.004 19.70 76.259 0.39 0.047 3 FZ training
76 1275 175 5.10 0.031 30.90 62.909 0.64 0.420 3 FZ training
100 1275 225 1.60 0.012 3.50 81.479 13.10 0.309 2 FZ training
123 1275 275 2.80 0.029 17.20 71.044 8.24 0.687 1 FZ training

3 Exploratory Data Analysis

3.1 Descriptive Statistics Continuous Variables

Generally, three groups of data are present in the dataset:

  • X1, the ID-value of the sampling point
  • The spatial data: X and Y
  • The quantitative lab data for the elements Al, Co, Fe, Mg, Ni and a Filler variable. The Filler variable is “introduced to achieve closure and to retain the intuitive relationship between each component and the mass of its associated element” [1].
  • The categorical variables Hard, Lcode and subset. Hard relates to the rock hardness and Lcode to the above-mentioned lithologies. subset divides the dataset into a part that is used for the actual modelling and a validation part. The main part of this work is done with the “training” part of the dataset.

3.1.1 Summary Statistics

The summary statistics for the numeric variables are shown in the following table.

options(knitr.kable.NA = '', digits=2, scipen=999)
 
 sumstats <- profiling_num(dt) %>% #convenient function to calculate most standard summary stats
   select(mean, std_dev, variation_coef, p_25, p_50, p_75, skewness, kurtosis) %>% #select wanted stats
  cbind (.,data.frame(min = sapply(dt[,.SD, .SDcols = is.numeric], function(x) min(x, #add min / max
         na.rm = T)), 
         max = sapply(dt[,.SD, .SDcols = is.numeric], function(x) max(x, 
         na.rm = T)) )) 
 
 names(sumstats) <- c("Mean", "Std. Dev.","Var. Coef.", "Q1", "Q2/Median", "Q3", "Skewness", "Kurtosis", "min", "max")
 #improve look and print
  sumstats %>%
   kable( table.attr = "style = \"color: black;\"")  %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed")) 
Mean Std. Dev. Var. Coef. Q1 Q2/Median Q3 Skewness Kurtosis min max
X1 9292.05 5313.11 0.57 4303.50 8693.00 14329.50 -0.31 1.4 23.00 14513.0
X 589.97 346.76 0.59 300.00 550.00 850.00 0.32 2.0 0.00 1325.0
Y 355.07 174.26 0.49 225.00 325.00 475.00 0.54 2.5 25.00 850.0
Al 4.84 3.57 0.74 1.50 4.20 7.60 0.55 2.4 0.10 17.5
Co 0.07 0.12 1.85 0.02 0.03 0.06 5.35 43.4 0.00 1.4
Fe 23.21 13.04 0.56 10.60 22.30 34.80 0.20 1.7 1.00 54.9
Filler 65.78 10.63 0.16 56.34 67.64 74.01 -0.10 2.1 40.33 93.8
Mg 5.40 6.31 1.17 0.24 1.41 11.14 0.85 2.3 0.06 24.1
Ni 0.70 0.51 0.72 0.33 0.58 0.97 1.24 4.7 0.01 3.0

It stands out that Co exhibits both a very high skewness and kurtosis. This points towards a highly skewed distribution with possible outliers. Fe and Filler exhibit a relatively low skewness. The distributions of the variables will be investigated further below. X1, the drillhole ID, will be dropped from now on.

Regarding the units of the variables in the dataset, it can be assumed that the values of the chemical elements are in weight-percent and that the spatial coordinates are in metres. The exact coordinate system is not known, but it appears to be a local Cartesian coordinate system. It is not known whether X denotes Northing or Easting. For the sake of simplification, it is chosen to be the same as the traditional X-axis in plots.

Furthermore, the chemical variables and the Filler sum up to 100 %. This allows the chemical components to be treated as a composition.
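As a quick sketch, the closure property can be checked on the first observation shown in the data head above (a single toy row here; in the report this would be applied to all rows of dt):

```r
# Closure check: the element columns plus Filler should sum to
# (approximately) 100 wt.-% in every row. One toy row for illustration.
toy <- data.frame(Al = 4.70, Co = 0.001, Fe = 41.50,
                  Filler = 53.685, Mg = 0.09, Ni = 0.024)
row_sums <- rowSums(toy)
all(abs(row_sums - 100) < 0.1)  # TRUE within rounding
```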

3.1.2 Density Plots

While histograms are a common and familiar technique to evaluate distributions of data, they are harder to read when data of multiple groups are expressed in one plot. Because of this, density plots are chosen for visualisation in this work. This allows for a grouping of the data according to their lithology.


#prepare dt; colnames.num already excludes X1, X and Y
colnames.num <- names(dt[,.SD, .SDcols = is.numeric][,-c("X1", "X", "Y")])
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
                        id.vars = c("Lcode"))
#draw density plots

a <- ggplot(data = dt.m, aes( x=value, fill=Lcode)) +
  geom_density( alpha = 0.3, aes(y = ..density..))+
  facet_grid(scales = "free", rows = vars(variable))
  
a

It shows that for all variables except Fe and Filler, logarithmic scales seem to be preferable. This is in line with the findings from the summary statistics. Hence, in the following, the log-transformed results are shown:


#prepare dt; colnames.num already excludes X1, X and Y
colnames.num <- names(dt[,.SD, .SDcols = is.numeric][,-c("X1", "X", "Y")])
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
                        id.vars = c("Lcode"))
#draw density plots 

a <- ggplot(data = dt.m, aes( x=log(value), fill=Lcode)) +
  geom_density( alpha = 0.3, aes(y = ..density..))+
  facet_grid(scales = "free", shrink=F, rows = vars(variable))
  
a

Both the general density plots and the log-scale density plots show that the distributions differ between the lithologies. For almost all elements, differences in the empirical density functions can be observed: both the spread of the distributions and the measures of centrality differ. The observed differences are discussed further in the subsequent sections.

It can also be seen that a logarithmic scale provides distributions that are less skewed and closer to a normal distribution, which is a prerequisite for the kriging analysis.

3.1.3 QQ-Plots

To confirm this, the QQ-plots are drawn on the normal data:

{par(mfcol=c(2,3))
qqnorm((dt$Al), col=2);qqline((dt$Al));legend(legend = "Al","topleft")
qqnorm((dt$Co), col=2);qqline((dt$Co));legend(legend = "Co","topleft")
qqnorm((dt$Fe), col=2);qqline((dt$Fe));legend(legend = "Fe","topleft")
qqnorm((dt$Filler), col=2);qqline((dt$Filler));legend(legend = "Filler","topleft")
qqnorm((dt$Mg), col=2);qqline((dt$Mg));legend(legend = "Mg","topleft")
qqnorm((dt$Ni), col=2);qqline((dt$Ni));legend(legend = "Ni","topleft")}

Large deviations can be seen for most distributions; especially the tail behaviour differs from the normal distribution. The data are therefore log-transformed:

dt.orig <- dt #backup for debugging
fkt <- c("Al", "Co","Fe", "Filler", "Mg", "Ni") #columns to transform
fkt_new <- c("Al_log", "Co_log","Fe_log", "Filler_log", "Mg_log", "Ni_log") #names of the transformed columns
dt <- dt[,(fkt_new) := lapply(.SD, log), .SDcols = fkt] #actual transformation with data.table

Now the QQ-plots for the log-transformed data are as follows:

{par(mfcol=c(2,3))
qqnorm((dt$Al_log), col=2);qqline((dt$Al_log));legend(legend = "Al_log","topleft")
qqnorm((dt$Co_log), col=2);qqline((dt$Co_log));legend(legend = "Co_log","topleft")
qqnorm((dt$Fe_log), col=2);qqline((dt$Fe_log));legend(legend = "Fe_log","topleft")
qqnorm((dt$Filler_log), col=2);qqline((dt$Filler_log));legend(legend = "Filler_log","topleft")
qqnorm((dt$Mg_log), col=2);qqline((dt$Mg_log));legend(legend = "Mg_log","topleft")
qqnorm((dt$Ni_log), col=2);qqline((dt$Ni_log));legend(legend = "Ni_log","topleft")}

It can be seen that for:

  • for Al, the log transformation moves the tail from the lower side to the upper side,
  • for Co, a significant improvement towards a normal distribution can be observed,
  • for Fe, the symmetric tail behaviour changes to a one-sided tail,
  • for Filler, little difference can be observed,
  • for Mg, an improvement is observed, although it still shows a significant two-sided tail, and
  • for Ni, a small improvement is observed in the central part of the distribution.

As a result, all values, even Fe and Filler, will be used in their log-transformed version. In this way, all variables can be compared with each other.

3.2 Categorical Variables

The summary of the categorical variables is shown below. Of relevance to the understanding of this dataset are the two variables Hard and Lcode. Additionally, a variable subset exists. It stores the information about which part of the data set is used for training and which is used for validation of the modelling process. It is distributed almost 50-50 across the dataset. The table below shows the number of members in the respective groups.


dt[,.SD, .SDcols = is.factor] %>% summary %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Hard Lcode subset
0: 53 FZ:334 training :371
1:394 SA: 65 validation:364
2:166 SM:278
3:122 UM: 58

The interrelationship of hardness and lithology is plotted with a mosaic plot. Hereby, all possible combinations of levels for Lcode and Hard are plotted against each other. The size of the cells along each of the two axes corresponds to the number of members of that level. Additionally, the Pearson residuals are plotted as blue and red shades. A positive Pearson residual corresponds to more members being in this level combination than the null hypothesis would suggest; the null hypothesis is that no particular association exists between the groups. More information can be found in the supplement to the lecture [3] or in [4]. In addition to the mosaic plot, the numeric values of the group combinations are shown in the contingency table.
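The shading logic can be sketched with base R's chisq.test, whose residuals element contains exactly these Pearson residuals, (observed - expected) / sqrt(expected). The 2x2 table below is a hypothetical toy example, not data from this report:

```r
# Pearson residuals as shaded in the mosaic plot; toy 2x2 table.
tab <- matrix(c(30, 10, 10, 30), nrow = 2,
              dimnames = list(Hard = c("0", "1"), Lcode = c("FZ", "SM")))
chi <- chisq.test(tab, correct = FALSE)
chi$residuals  # positive where more observations than independence predicts
```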

Mosaic plot



#draw mosaic plot
vcd::mosaic(~Hard+Lcode, data = dt, shade=TRUE, legend=TRUE, direction="v")

Contingency Table

#calculate contingency table
ct <- dt[,.SD, .SDcols = c("Hard", "Lcode")] %>% 
  table() %>% as.data.frame.array() 

#add colsums
csum <- ct %>% apply(.,2,sum)
ct["Sum",] <- csum
#add rowsums
rsum <- ct %>% apply(.,1,sum)
ct$Sum <- c(rsum)

#print
ct %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
FZ SA SM UM Sum
0 19 10 21 3 53
1 169 43 181 1 394
2 81 6 63 16 166
3 65 6 13 38 122
Sum 334 65 278 58 735

It shows that rock hardness 3 is positively associated with UM (fresh ultramafic zone) and negatively with SM (smectitic zone). Other than that, weaker associations seem to exist between SA (saprolitic zone) and hardness 0 as well as between SM and hardness 1. Especially the association of hardness level 3 is plausible, because fresh unweathered rock is generally harder/more robust than its weathered counterparts. In this dataset, the hardness level could also identify or incorporate indicators of the rock mass quality, as details of the nature of the variable Hard are not known.

3.3 Association of Elements to Rock Classes

To identify the association of the elements to the lithologies, boxplots after Tukey are produced:

#for grouped stats with ggplot, melted data tables are a convenient option
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Lcode")],
                        id.vars = c("Lcode"))

ggplot(data = dt.m, aes( y=log(value),group=Lcode, fill=Lcode)) +
  geom_boxplot( alpha = 0.3, notch = T)+
  facet_wrap(scales = "free_y", shrink=F, ~variable, ncol=3)+
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

It can clearly be seen that differences in the element components between the lithologies appear. For Al and Fe, the highest values are found in zone FZ, followed by SA, SM and UM in decreasing order. Mg and Filler behave oppositely, with the lowest values found in FZ and with SA, SM and UM showing values in increasing order. Co and Ni show a different behaviour: here, the highest values are found in SA, followed by SM. The association of SA with Ni and Co is already known from the geological description of the site and can be confirmed here. FZ and UM show similarly low values of the two elements. The notches give a graphical indication of whether the differences in the medians are significant: when the notches do not overlap, there is strong evidence that the medians differ. According to that, the differences between:

  • SM and UM for Al,
  • SA and SM for Co,
  • SM and UM for Filler,
  • SM and UM for Mg and
  • SA and SM for Ni

are not significant at the 95 % confidence level.
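The notch extent drawn by ggplot2's geom_boxplot follows the convention median ± 1.58·IQR/√n; a minimal sketch with toy data (values hypothetical):

```r
# Notch bounds as used by geom_boxplot: median +/- 1.58 * IQR / sqrt(n).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
n <- length(x)
notch_half <- 1.58 * IQR(x) / sqrt(n)
c(lower = median(x) - notch_half, upper = median(x) + notch_half)
# approx. lower 2.89, upper 7.11
```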

The results, however, show that generally a relatively strong influence of the lithology on the mineralization can be assumed.

3.3.1 Association of Elements to Rock Hardness

To identify whether similar findings can also be observed for the rock hardness, the procedure is repeated with grouping by Hard.

Elements vs Hardness

#for grouped stats with ggplot, melted data tables are a convenient option
dt.m <- melt.data.table(dt[,.SD, .SDcols = c(colnames.num, "Hard")], 
                        id.vars = c("Hard"))

ggplot(data = dt.m, aes( y=log(value), fill=Hard)) +
  geom_boxplot( alpha = 0.3, notch = T)+
  facet_wrap(scales = "free", shrink=F, ~variable, ncol=3)

Detail: Elements vs. Hardness for “SA”


dt.m <- melt.data.table(dt[Lcode=="SA",.SD, .SDcols = c(colnames.num, "Hard")], # Example for detailed view for Lcode="SA"
                        id.vars = c("Hard"))

ggplot(data = dt.m, aes( y=log(value), fill=Hard)) +
  geom_boxplot( alpha = 0.3, notch = F)+
  facet_wrap(scales = "free", shrink=F, ~variable, ncol=3)+
  labs(title = "Composition parts vs. Hardness for Lcode SA")

While some differences can be seen, the general picture is less distinct than for the rock type. UM tends to be the hardest zone and also tends to contain more Mg; this, however, does not show in the hardness plot.

For Co, the picture at first sight is very similar to the results from the boxplots grouped by Lcode. Zone SA (which is relatively more strongly associated with hardness 0) showed the highest Co content; here, however, hardness 1 shows the highest Co contents. So a simple relationship cannot be assumed. Cross-interrelations between lithology and hardness could also be assumed.

For example, within zone SA, an even stronger accumulation of Co could be associated with a certain rock hardness. This is shown exemplarily in the second boxplot. There, the notches are removed because the results are partially so skewed that the notches would extend over the IQR. Generally, a similar picture can be observed for the elements Co, Mg and Ni: a stronger accumulation of these elements occurs at hardness 1. The composition of a rock mass can influence the hardness even at a small scale. Some indicators for this during mechanical excavation were found by Rimos et al. [3] during a Master's thesis at TU BAF. There, the cutting resistance for cutting pyritic Pb-Zn ore samples from Reiche Zeche with conical picks was tested. It showed a higher cutting resistance in areas in which increased Ca contents were found. The element contents were approximated in a dense pattern by means of a handheld XRF device.

However, due to the necessary complexity in model building, this influence is dropped, as it would overextend the framework of this work.

# to be completed if necessary

# options(scipen = 5)
# i <- 1
# for (i in 1:length(levels(dt$Lcode))) {}
#   contrasts(dt$Lcode) <- contr.treatment(levels(dt$Lcode), base=i)
# u <- levels(dt$Lcode)[[i]]
# lm1 <- lm(dt,formula = Mg~Lcode) 
# lm1 <- lm1%>%summary
# 
# names(lm1[["coefficients"]][,1])
# lm1[["coefficients"]][,1]
# lm1[["coefficients"]][,4]

3.4 Correlation and Covariance

The correlation and covariance statistics for the numeric values are presented in the following. The correlation is calculated twofold: as Spearman correlation and as Pearson correlation. The Pearson correlation measures the goodness of fit of a linear regression between two variables; a Pearson coefficient of 1 corresponds to an R² of 1 for a linear regression. The Spearman correlation coefficient follows the same basic equation but uses the ranks of the data pairs instead of their actual values. As such, a Spearman correlation of 1 means that the relation between the pair is a perfectly monotonically increasing function. Consequently, a high Spearman and a lower Pearson coefficient indicates, e.g., a monotone but nonlinear relation. A more detailed description of the two methods can be found, e.g., in [2].
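The distinction can be illustrated with a toy example of a strictly monotone but nonlinear relation (hypothetical data, not from this report):

```r
# Strictly monotone but nonlinear relation: Spearman is exactly 1
# (the ranks coincide), while Pearson is smaller.
x <- 1:10
y <- x^3
cor(x, y, method = "spearman")  # 1
cor(x, y, method = "pearson")   # < 1
```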

Values in the upper triangle show the respective Spearman correlation coefficients; values in the lower triangle show the Pearson correlation coefficients. Correlations that do not satisfy a significance threshold of p < 0.05 are marked with a black cross. These correlations can be considered insignificant.

All calculations are done with the log-transformed values, as these will be used for the further interpolation - except for the spatial variables X and Y. For comparison, the covariance matrix is also computed.

Correlation

use <- cbind(dt[,2:3], dt[,4:9] %>% log())

dt.ptest.sp <- cor.mtest(use,method="spearman")
dt.cor.sp <- cor(use,method = "spearman")
dt.ptest.p <- cor.mtest(use,method="pearson")
dt.cor.p <- cor(use,method = "pearson")


corrplot(dt.cor.sp,p.mat = dt.ptest.sp$p,
         method = "color", outline=T,
         type="upper",
         addgrid.col = "grey", tl.col = "black",
          tl.pos= "td",
         addCoef.col="black",
         sig.level = 0.05, 
         order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
         mar =c(1,0,2,1)) 

  corrplot(dt.cor.p,p.mat = dt.ptest.p$p,
         method = "color", outline=T,
         type="lower",
         addgrid.col = "grey", tl.col = "black",
          tl.pos= "lt",
         addCoef.col="black",
         cl.pos="n",
         sig.level = 0.05, 
         order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
         mar =c(1,0,1,1), add=T)
  
mtext(side = 3,line = 2, "Top: Spearman")
mtext(side = 2, "Bottom: Pearson")

Generally, it can be seen that the Pearson and Spearman correlations show similar results. In some cases, the Spearman correlation gives higher values than Pearson, which follows from its rank-based definition. However, the opposite can also be observed here: when the Pearson correlation is higher than Spearman, the relation is close to linear but contains small rank reversals, which reduce the rank-based Spearman coefficient more strongly. When the Spearman correlation is higher, a monotone but not quite linear relation is present.

The variables X and Y show very low correlations with the geochemical variables, and some of these are also insignificant. This speaks for the absence of a general spatial trend; this will be investigated later in more detail.

Some correlation pairs show a higher correlation after Spearman. This applies mainly to Mg/Ni, Al/Ni and Filler/Fe. Others, like Mg/Filler and Fe/Al, show a higher correlation after Pearson.

Generally, within the group of chemical variables, the pairs show the following noteworthy correlations:

  • Al/Fe, Ni/Co and Mg/Filler show a relatively strong positive correlation
  • Al/Filler, Al/Mg, Al/Ni, Fe/Filler and Fe/Mg show a negative correlation

Noteworthy is that Al correlates positively only with Fe and negatively with Filler; this speaks for the fact that it occurs as a companion of the iron and not as its own silicate mineralization. This is in line with the geological description after [1], which describes a lateritic zone. Co and Ni show a positive relation, while at the same time Ni shows a negative relation to Al. This speaks for the fact that these two form their own mineralization complex.

Covariance log values

To allow for a comparison of the variance values, the covariance is calculated for the log-transformed values. Otherwise, the variables would be on different scales, and through the quadratic nature of the (co)variance, this would lead to values that are not comparable between the variables.
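A toy illustration (hypothetical values, not from this dataset) of why covariances of variables on different scales are not comparable, while correlations are scale-invariant:

```r
# Covariance is scale-dependent: rescaling one variable rescales the
# covariance, while the correlation is unchanged.
set.seed(1)
a <- rnorm(100)
b <- a + rnorm(100, sd = 0.5)
all.equal(cov(a, 10 * b), 10 * cov(a, b))  # TRUE
all.equal(cor(a, 10 * b), cor(a, b))       # TRUE
```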


use <- dt[,4:9] %>% log()

dt.cov <- cov(use,method = "pearson")



corrplot(dt.cov,
         method = "color", outline=T,
         type="upper",
         addgrid.col = "grey", tl.col = "black",
          tl.pos= "td",
         addCoef.col="black",
         sig.level = 0.05, 
         order="original", hclust.method = "complete", addrect = 6, rect.col = "black",
         mar =c(1,0,2,1),is.corr = F,
         number.digits=2, number.cex = 0.8,
         cl.align.text = "l") 

The plot shows similar results to the correlation analysis. However, it also shows the magnitude of the variation, since the covariance is a non-standardized version of the correlation. As such, Fe and Filler do not vary much on the log scale, although they show the highest raw values. The highest absolute covariance values appear for the pairs Al-Mg and Fe-Mg; since these values are negative, the correlation is also negative. Also noteworthy is the pair Fe/Filler, which shows a strong correlation but a weak covariance. This can be attributed to the low standard deviation, especially of Filler.

3.5 Scatter Plot Matrix and Swath Plots

A scatter plot matrix can shed more light on the cross-relations between the covariates. It is a tool that gives a broad understanding of the situation and allows a more detailed picture than a pure correlation analysis. Here, the data are grouped according to the rock type to identify whether the correlation behaviour differs between the rock types. In the upper part, loess smoothing curves are drawn. On the diagonal, the density plots already shown previously are drawn. In the lower section, raw scatterplots are drawn.

While the scatter plot matrix unveils the behaviour of the data according to their Lcode, the global swath plots are also drawn.

Scatter plot matrix

library(GGally)


bigplot <- ggpairs(data = dt,columns = c(2,3,13:18),
                   mapping = ggplot2::aes(colour=dt$Lcode), 
                   lower = list(continuous = wrap("points", alpha = 0.8, size=0.3)), 
                   diag = list(discrete="barDiag", 
                               continuous = wrap("densityDiag", alpha=0.5 )), 
                   upper = list(continuous = wrap("smooth_loess", alpha = 0.1, size=0.05)
                   ) 
) +
  theme(panel.grid.major = element_blank())
(bigplot)

The following points can be summarized from the Scatter Plot Matrix:

  • The clear correlation between Fe and Filler is repeated

  • The scatter plots indicate clustered structures according to the rock type, where the behaviour can be relatively different between the rock types. Comparing the regression curves to the correlation coefficients and the scatter plots, it shows that for most pairs the regression relations are relatively weak and the data appear more in clusters than in trends.

Swath Plot X-direction

X <- dt[,c("X", "Y")]
swath(data = dt[,..fkt_new], loc=X$X)

For Mg_log, an outlying, strongly varying area can be seen at low X-values. Also, slight trends can be observed for Fe, Filler and, to an extent, Ni.

Swath Plot Y-direction


X <- dt[,c("X", "Y")]
swath(data = dt[,..fkt_new], loc=X$Y,xlab = "Y",  )

In the Y-direction, strong local trends appear at low Y-values.

These findings speak somewhat for the incorporation of directional trends, especially in the Y-direction.

3.6 Spatial Descriptive Analysis

In the following, the input variables are described with respect to their spatial distribution.

3.6.1 Categorical Variables

The following maps plot the Hard and Lcode variables as dot plots. The vertical axis represents the coordinate Y and the horizontal axis the coordinate X. Only the training data are plotted.


dt.m <- melt.data.table(dt[subset=="training",.SD, .SDcols = c("Lcode", "Hard",  "X", "Y")],
                        id.vars = c("X", "Y"))



temp <- list()
for (i in unique(dt.m$variable)) {
temp[[i]] <- ggplot(dt.m[variable==(i)], aes(x = X, y = Y, color = value)) +
  geom_point(shape=19)+
  coord_fixed()+
  ggtitle(label = i) #+
  #scale_color_gradient(low = "blue2",high = "red2")

}

do.call("grid.arrange", c(temp, ncol=2))


# par(mfrow=c(1,2))
# 
# plot(Y~X, data=dt, pch=15, col=1+as.integer(Hard), asp=1, cex=0.5)
# legend("topleft", legend=levels(dt$Hard), fill=2:6)
# 
# plot(Y~X, data=dt, pch=15, col=1+as.integer(Lcode), asp=1, cex=0.5)
# legend("topleft", legend=levels(dt$Lcode), fill=2:6)

It can be seen that the variable Hard is relatively evenly distributed. However, Hard 2 and 3 seem to accumulate towards the east of the field. For Lcode, an accumulation of FZ can be seen in a central E-W band. SM is accumulated more towards the west and along the N and S borders of the project. Of importance are also the classes UM and SA: they are relatively sparsely distributed. UM can be found in thin bands along the ESE-WNW axis. SA can be found in single patches in areas where FZ and SM seem to form contact zones. Since it was already shown that the element contents are affected by the Lcode, this speaks for the fact that trends need to be investigated.

3.6.2 Continuous Variables

To obtain an understanding of the distribution of the different elements, the values are plotted as follows:



dt.m <- melt.data.table(dt[,.SD, .SDcols = c(fkt_new,  "X", "Y")],
                        id.vars = c("X", "Y"))



dt.m
##          X   Y variable value
##    1: 1325 225   Al_log  1.55
##    2: 1325 325   Al_log  0.36
##    3: 1300 250   Al_log  1.28
##    4: 1275 175   Al_log  1.63
##    5: 1275 225   Al_log  0.47
##   ---                        
## 4406:   50 675   Ni_log -2.73
## 4407:   50 825   Ni_log -0.93
## 4408:   25 150   Ni_log -0.39
## 4409:   25 700   Ni_log -1.71
## 4410:    0 125   Ni_log -0.55
temp <- list()
for (i in unique(dt.m$variable)) {
temp[[i]] <- ggplot(dt.m[variable==(i)], aes(x = X, y = Y, color = value)) +
  geom_point()+
  coord_fixed()+
  ggtitle(label = i)+
  scale_color_gradientn(colors = c("blue2","purple", "red2"))
  
}

do.call("grid.arrange", c(temp, ncol=3))

It can be seen from the maps that areas that show high concentrations of Al tend to show low concentrations of Co, Mg and, to some extent, Ni. The Filler variable also exhibits this behaviour. Generally, it seems that two different groups of mineralization exist: one group with high Al and Fe concentrations and one zone with high Mg, Ni, Co and Filler concentrations. This is also in line with earlier findings, where the rock type corresponds with certain element levels.

3.7 Bin Width and Maximum Distance

For the subsequent geostatistical modelling procedures, the bin width and the maximum variogram distance are estimated as follows. For this estimation, only the assay points of the “training” part of the data set are used.

The minimum distance between sample sites is ca. 35 m; this value will be used as the bin size. The maximum distance can be estimated via the median distance between assay sites, or graphically via the radius of the biggest circle that fits within the boundaries of the survey area. The median distance is 431.57 m. The radius of the biggest circle fitting in the study area is estimated as ca. 230 m. As a trade-off between the two values, a cutoff of 300 m is chosen, since the area has a relatively elongated shape.
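A minimal sketch of these distance heuristics on a hypothetical regular grid (in the report they would be computed on the training coordinates of dt):

```r
# Minimum and median pairwise distances between sample locations,
# as candidates for bin width and maximum variogram distance.
# Toy coordinates: a regular 5x5 grid with 25 m spacing.
coords <- expand.grid(X = seq(0, 100, by = 25), Y = seq(0, 100, by = 25))
d <- dist(coords)   # all pairwise Euclidean distances
min(d)              # candidate bin width (here: 25)
median(d)           # candidate maximum variogram distance
```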

For reference, the subsequent plot shows the training and validation data locations.



a <- ggplot(dt, aes(x = X, y = Y, color = subset)) +
  geom_point()+
  coord_fixed()+
  ggtitle(label = "Interactive Plot Training/Validation Data Set")

plotly::ggplotly(a)

4 Indicator Kriging with Anisotropy IK

For the interpolation of the categorical variable, ordinary indicator kriging is used. No cokriging is used in this case; the indicator variables are modelled individually. A version of indicator cokriging is included in the mixed-model cokriging (VK2). Here, uncorrelated anisotropic variograms are fitted sequentially, so no interaction between the different rock zones is assumed. This allows a more exact fit of the individual semivariogram models and also permits a rough comparison between the results of the mixed model and the sequential indicator kriging. Shown below is one of the fitted variograms.
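The indicator transform behind the (Lcode=="FZ")~1 formulas used below simply recodes each lithology into a 0/1 variable, which is then kriged like a continuous attribute. A minimal sketch:

```r
# Toy lithology vector; each class becomes its own 0/1 indicator
lcode <- factor(c("FZ", "SA", "FZ", "UM", "SM"))

ind_FZ <- as.numeric(lcode == "FZ")  # 1 where the sample is FZ, else 0
ind_FZ
# [1] 1 0 1 0 0
```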

#prepare grid for all further use
options(scipen = 5, digits = 6)
extends <-dt[,c("X", "Y")] %>% sapply(range) %>% as.data.frame()
x = seq(from=min(extends$X)-100, to=max(extends$X)+100, by=10)
y = seq(from=min(extends$Y)-100, to=max(extends$Y)+100, by=10)
xy_grid = expand.grid(X=x, Y=y)

# create the gstat containers
gs_i_FZ <- gstat(id="FZ", formula = (Lcode=="FZ")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)

gs_i_SA <- gstat(id="SA", formula = (Lcode=="SA")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300) 

gs_i_SM <- gstat(id="SM", formula = (Lcode=="SM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300) 
  
gs_i_UM <- gstat(id="UM", formula = (Lcode=="UM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300) 

Lcodes <- c("FZ", "SA", "SM", "UM")
#initiate the variograms as loop for each LCode
for (i in c("FZ", "SA", "SM", "UM")){
  assign(paste0("vg_i_", i), 
         variogram(get(paste0("gs_i_", i)),
                   width=35, cutoff=300, alpha=c(0:5)*30 ))
}

#fit FZ
#plot(vg_i_FZ)
vgt_FZ  <- vgm(psill=0.1, model="Wav", range=120, nugget=0,anis = c(40,0.4), 
               add.to = vgm(psill=0.15, range=350,  nugget=0.1, model="Gau"))
  
 # plot(vg_i_FZ, model=vgt_FZ)

#  fit SA
#plot(vg_i_SA)
vgt_SA  <- vgm(psill=0.05, model="Wav", range=50, nugget=0.03,anis = c(0,0.4))
  
  #plot(vg_i_SA, model=vgt_SA)
  
#  fit SM
#  plot(vg_i_SM)
vgt_SM  <- vgm(psill=0.05, model="Wav", range=80, nugget=0.03,anis = c(0,0.4),
               add.to = vgm(psill=0.15, range=350,  nugget=0.1, model="Exp"))
  
  #plot(vg_i_SM, model=vgt_SM)

  #fit UM
#  plot(vg_i_UM)
vgt_UM  <- vgm(psill=0.05, model="Exp", range=150, nugget=0.02,anis = c(120,0.3),
               add.to = vgm(psill=0.01, range=50,  nugget=0, model="Wav",anis = c(30,0.3)))
  
  plot(vg_i_UM, model=vgt_UM)

The figure shows the fitted variogram model for UM. To fit it, a nested combination of an exponential and a wave function was used, with differing anisotropy azimuths and ratios: the exponential part has an azimuth of 120° and the wave part of 30°. The figure also shows that a highly accurate fit could not be achieved; however, the main characteristics of the variogram structure are captured. In summary, the following variogram models were used:

  • FZ: Nested Wave (Anisotropic) and Gaussian (omnidirectional)
  • SA: Wave (Anisotropic)
  • SM: Nested Wave (Anisotropic) and Exponential (omnidirectional)
  • UM: Nested Wave (Anisotropic) and Exponential (Anisotropic)

With the fitted variograms, the actual indicator kriging based modelling was conducted:

  #update gstats
  
  for (i in Lcodes){
  assign(paste0("gs_i_", i),
         gstat(id=i,
               formula = (Lcode==paste0(i))~1,
               locations = ~X+Y,
               data=dt[subset=="training"],
               model = get(paste0("vgt_",i)),
               nmax=30)
  )
}

  #kriging
  
  for (i in Lcodes){
  assign(paste0("krig_", i),
         predict(get(paste0("gs_i_", i)),
                     newdata=xy_grid)
  )
}
## [using ordinary kriging]
## [using ordinary kriging]
## [using ordinary kriging]
## [using ordinary kriging]
krig_i_s <- cbind(krig_FZ, krig_SA[,3:4], krig_SM[,3:4], krig_UM[,3:4])

krig_i_s$Lcode <- krig_i_s %>% select(contains(".pred")) %>% max.col %>% 
factor(x = .,levels = c(1,2,3,4),labels = Lcodes) 

krig_i_s %<>% as.data.table()

# applying variance cutoff of Q0.90
var_cutoff <- krig_i_s$FZ.var %>% quantile(0.90, na.rm = T)
krig_i_s <- krig_i_s[FZ.var < var_cutoff]


#Plotting the results

a <- ggplot(data = krig_i_s, aes(x=X, y=Y, fill=krig_i_s$Lcode))+
  geom_raster()+
  coord_fixed()+
  geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y,  fill=Lcode), size=3, shape=21)+
  labs(fill="Lcode", title = "Lcode")

a

The image shows the interpolated results as well as the original sampling points. For the SA areas, small “islands” appear. For UM, which is also relatively scarce, smaller patchy areas are likewise modelled. At some sampling points, the modelled patches are so small that they visually do not extend beyond the overlaid sampling point. The quality of the interpolation is quantified with a confusion matrix as follows.
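For reference, the classification rule applied above (each grid node receives the lithology with the highest kriged indicator value, via max.col) can be illustrated in isolation on toy predictions:

```r
Lcodes <- c("FZ", "SA", "SM", "UM")

# Toy kriged indicator predictions for two grid nodes (one row per node)
preds <- matrix(c(0.6, 0.2, 0.1, 0.1,
                  0.1, 0.5, 0.3, 0.1),
                nrow = 2, byrow = TRUE)

# max.col returns the column index of the row-wise maximum
factor(Lcodes[max.col(preds)], levels = Lcodes)
# [1] FZ SA
```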


  #kriging for validation data
  
  for (i in Lcodes){
  assign(paste0("krig_val_", i),
         predict(get(paste0("gs_i_", i)),
                     newdata=dt[subset=="validation",])
  )
}
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]

krig_val_i_s <- cbind(krig_val_FZ, krig_val_SA[,3:4], krig_val_SM[,3:4], krig_val_UM[,3:4])

krig_val_i_s$Lcode <- krig_val_i_s %>% select(contains(".pred")) %>% max.col %>% 
factor(x = .,levels = c(1,2,3,4),labels = Lcodes) 

krig_val_i_s %<>% as.data.table()


CM_i <- cvms::confusion_matrix(targets =dt[subset=="validation", Lcode], predictions = krig_val_i_s$Lcode )

cvms::plot_confusion_matrix(CM_i$`Confusion Matrix`[[1]],
                      add_sums = TRUE,
                      add_normalized = FALSE,
                      diag_percentages_only = TRUE,
                      add_row_percentages = FALSE)

It can be seen that the accuracy is relatively mediocre. The confusion matrix shows the counts in the respective fields, the sum of all predictions per class, as well as the sum of the true classes (“Target”). The percentage values show the sensitivity for the respective class, defined as the number of true positive predictions divided by the sum of true positives and false negatives. Additionally, the overall accuracy is calculated (not shown in the plot) as the number of correct predictions divided by the total number of cases [5]. For this model, the overall accuracy is 58.8%.
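Both metrics follow directly from the count matrix; a worked toy example with two classes (rows = target, columns = prediction):

```r
cm <- matrix(c(40, 10,
                5, 45),
             nrow = 2, byrow = TRUE,
             dimnames = list(target = c("A", "B"),
                             prediction = c("A", "B")))

sensitivity <- diag(cm) / rowSums(cm)   # TP / (TP + FN), per class
accuracy    <- sum(diag(cm)) / sum(cm)  # correct predictions / all cases

accuracy
# [1] 0.85
```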

In the context of accuracy, it must also be mentioned that the classes UM and SA appear relatively sparsely and their structures are small/thin. As such, a certain error can be expected when comparing to the validation set.

5 Cokriging with log-values (VK1)

In this chapter, omnidirectional cokriging with the log-transformed values is conducted. Since this model is supposed to serve as a “simplistic” baseline, neither anisotropy nor directional trends are incorporated.

5.1 Variography

As a first approach to the modelling of the data set, the omnidirectional variograms are fitted as follows.


#variables to use
vars_l = c("Al_log", "Co_log", "Fe_log", "Filler_log", "Mg_log", "Ni_log")

#loop for allocating values to gstat object - only training data
#--> slight change to script: instead of prepopulating gs_l, it starts empty, so the specifics of the variable definition have to be typed only once

{gs_l=NULL
for(j in vars_l){
  frm = as.formula(paste(j,1, sep="~"))
  print(frm)
  gs_l = gstat(gs_l, id=j, formula=frm, locations=~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 300)
}
}
## Al_log ~ 1
## Co_log ~ 1
## Fe_log ~ 1
## Filler_log ~ 1
## Mg_log ~ 1
## Ni_log ~ 1

vge_l = gstat::variogram(gs_l,width=35, cutoff= 300) # create variograms

#create initial varmodels
vgt_l = vgm(psill=0.5, range=120, nugget=1,  model="Sph", add.to=vgm(psill=0.8, range=300,   model="Gau"))

#fitting
gs_l = gstat::fit.lmc(v=vge_l, g = gs_l, model = vgt_l, correct.diagonal = 1.00001)

#plot varplots + fits
plot(vge_l, model = gs_l$model)

Most variogram fits are considered sufficient. Some element combinations show problematic fits; however, their weight in the final result is considered very small, so the fits are accepted. With this, the actual cokriging commences.

Predictions with a variance greater than the 0.95-quantile of the kriging variance of Fe are filtered out for display and further purposes.

#making grid with a margin of 100 m around extends of survey area
options(scipen = 5, digits = 6)
extends <-dt[,c("X", "Y")] %>% sapply(range) %>% as.data.frame()
x = seq(from=min(extends$X)-100, to=max(extends$X)+100, by=10)
y = seq(from=min(extends$Y)-100, to=max(extends$Y)+100, by=10)
xy_grid = expand.grid(X=x, Y=y)

#prediction
cokriged = predict(gs_l, newdata=xy_grid, debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
  5% done
 22% done
 30% done
 38% done
 47% done
 58% done
 71% done
100% done
#cokriged %>% select(contains(".var")) %>% summary # for debugging

#sets the variance cutoff above which values are omitted
var_cutoff <- cokriged$Fe_log.var %>% quantile(0.95, na.rm = T)

#omitting the respective values
cokriged <- cokriged %>% as.data.table()
cokriged <- cokriged[Fe_log.var < var_cutoff] 

cokriged_val <- predict(gs_l, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
  0% done
100% done


#storing backtransformed kriging values
cokriged2 <- cokriged %>% select(contains(".pred")) %>% exp %>% cbind(cokriged[,c("X", "Y")], .)
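A caveat on the backtransform above: exponentiating kriged log-values yields a median-type (biased-low) estimate of the original variable. A common lognormal correction adds half the kriging variance before exponentiating; it is not applied in this report, but sketched here on toy values:

```r
pred <- c(-0.5, -1.2, 0.1)   # toy kriged values on the log scale
kvar <- c(0.30, 0.45, 0.20)  # toy kriging variances

naive     <- exp(pred)             # backtransform as used above
corrected <- exp(pred + kvar / 2)  # lognormal mean correction
```

The correction always increases the estimate, with the difference growing with the kriging variance.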

5.2 Results Cokriging

Shown below are the interpolation results for the omnidirectional cokriging.

Al


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Al_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Al [%]")

a

Generally, only a few zones with higher Al contents of up to 12.5 % exist.

Co


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Co_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Co [%]")

a

Co is relatively scarce, with only three smaller areas where it is concentrated above 0.1 %. One area in the south-west especially stands out.

Fe


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Fe_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Fe [%]")

a

Iron is relatively abundant in the deposit, with large areas containing above 35 % iron in the east and west, as well as small strips in the north.

Filler


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Filler_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Filler [%]")

a

While Filler is not a real chemical variable, it is shown for reference purposes. It shows the opposite behaviour to iron. This is logical, since iron is the most abundant element and the covariates form a composition.
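Assuming Filler is defined as the remainder that closes each assay to 100 % (an assumption, not stated in the data description), its antithetic behaviour to the dominant Fe follows directly from the constant-sum constraint:

```r
# Hypothetical assay row in percent (toy values)
parts <- c(Al = 3, Co = 0.05, Fe = 35, Mg = 8, Ni = 1.2)

filler <- 100 - sum(parts)  # Filler rises exactly when the other parts fall
filler
# [1] 52.75
```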

Mg


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Mg_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Mg [%]")

a

Mg is very scarce over large areas, with the main area below 10 %. However, small strips in the central north as well as in the south show concentrations of up to 30 %.

Ni


a <- ggplot(data = cokriged2, aes(x=X, y=Y, fill=cokriged2$Ni_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]")

a

Ni shows several areas with higher measured contents, especially in the west with up to 1.5 % Ni. Higher concentrations can also be found in the centre and southeast.

6 Mixed Model Cokriging (VK2)

In this chapter, the mixed-model cokriging is shown. Here, all six continuous chemical variables, including the filler variable, and the Lcode are combined in one model. According to Journel and Rossi 1989 [6], kriging with a trend is mainly of importance when extrapolation is sought. Since this is not the case here, no trend is used.

#populate gstat with num-vars
{gs_cs=NULL
for(j in vars_l){
  frm = as.formula(paste(j,"1", sep="~"))
  gs_cs = gstat(gs_cs, id=j, formula=frm, locations=~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,nmin = 5)
}
}
#add cat. levels
gs_cs <- gstat(id="FZ", formula = (Lcode=="FZ")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,g = gs_cs, nmin = 5) %>% 
  gstat(id="SA", formula = (Lcode=="SA")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200, nmin = 5) %>% 
  gstat(id="SM", formula = (Lcode=="SM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200, nmin = 5) %>% 
  gstat(id="UM", formula = (Lcode=="UM")~1, locations = ~X+Y, data=dt[subset=="training"], nmax = 20, omax=6, maxdist = 200,nmin = 5) 

6.1 Variography

To incorporate anisotropy, the modelling procedure is as follows: first, the empirical variograms are evaluated. Then, using the varioApplet provided during the short course sessions, a preliminary fit with regard to anisotropy is obtained. The following anisotropy values were extracted:

  • Azimuth of strongest continuity: 156°
  • Anisotropy ratio: 0.56
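In gstat, these two values map onto the anis argument of vgm(): the first entry is the azimuth of the direction of maximum continuity (degrees clockwise from north), the second the anisotropy ratio (minor range divided by major range). A sketch with hypothetical sill and range values:

```r
library(gstat)

# 156°: direction of strongest continuity; minor range = 0.56 * major range
vgt_aniso <- vgm(psill = 0.9, model = "Sph", range = 300,
                 nugget = 0.1, anis = c(156, 0.56))
vgt_aniso  # one nugget row plus one spherical structure
```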

The following figure shows the empirical variogram with the fitted anisotropic model for Fe_log. The azimuth was varied in 30° steps; the tolerance equals 15° (default).


# empirical variogram
vg_cs <- variogram(gs_cs,width=30, cutoff=300, alpha=c(0:5)*30)

anis <- c(156,0.52) #anisotropy values from applet

#fit model
vg_cs_m = vgm(psill=0.9, range=300, nugget=1,anis= anis,   model="Sph")
gs_cs= gstat::fit.lmc(v=vg_cs, model=vg_cs_m, g=gs_cs, correct.diagonal = 1.0001)

#gs_cs$model <- model4

##draw selected plot with code from applet

    # number of directions
    ndirs = vg_cs$dir.hor %>% unique %>% length
    # color scale
    col_dirs = RColorBrewer::brewer.pal(ndirs,"Set1") 

    # plot
    vg0 = vg_cs[vg_cs$id=="Fe_log",]
    
    plot(gamma~dist, data=vg0, pch=21, cex=1.2,  
       bg=col_dirs[as.factor(dir.hor)],  
       xlim=range(0,vg0$dist), ylim=range(0, vg0$gamma))
    vg0[, c("dist", "gamma", "dir.hor")] %>% 
      split(as.factor(vg0$dir.hor)) %>% 
      sapply(lines, col="grey") %>% 
      invisible()
    
    
      myfun = function(x){
      lines(x[,c(1,2)], col=col_dirs[x$dir.hor/30+1], lty=2)
    }
    vg0[,c("dist", "gamma", "dir.hor")] %>% 
      split(as.factor(vg0$dir.hor)) %>% 
      sapply(myfun) %>% 
      invisible()
       
     for(i in 1:6){
      angle = pi/2 - ((i-1)*30)*pi/180
      direction = c(sin(angle), cos(angle) ,0)
      variodens = variogramLine(gs_cs$model$Fe_log, maxdist= 1.1*max(vg_cs$dist), dir=direction)
      lines(variodens, col=col_dirs[i], lwd=2)
     }
    legend(x = "bottomright", legend = vg_cs$dir.hor %>% unique %>% paste(., "°") , fill=col_dirs )

It can be seen that the fit is not perfect, but the general features of the variography are captured.


cok_cs = predict(gs_cs, newdata=xy_grid, debug.level = -1, nmax=20, omax = 6 ) %>% as.data.table()
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
100% done

#calculation of variance cutoff
var_cutoff <- cok_cs$Fe_log.var %>% quantile(0.90, na.rm = T)

#omitting the respective values
cok_cs <- cok_cs[Fe_log.var < var_cutoff] 



#predictions for evaluation set
cok_cs_val <- predict(gs_cs, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
  7% done
100% done
cok_cs_val$Lcode <- cok_cs_val %>% select(paste0(Lcodes, ".pred")) %>% max.col() %>% 
   factor(x = .,levels = c(1,2,3,4),labels = Lcodes)

#calculating most probable Lcode from results
Lcodes <- c("FZ", "SA", "SM", "UM")
cok_cs$Lcode <- cok_cs %>% select(paste0(Lcodes, ".pred")) %>% max.col() %>% 
   factor(x = .,levels = c(1,2,3,4),labels = Lcodes)


#kriging results of backtransformed values
cok_cs2 <- cok_cs %>% select(paste0(fkt_new,".pred")) %>% exp %>% cbind(cok_cs[,c("X", "Y")], .)
a <- ggplot(data = cok_cs2, aes(x=X, y=Y, fill=cok_cs2$Ni_log.pred))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]", title = "Backtransformed Ni")


b <- ggplot(data = cok_cs, aes(x=X, y=Y, fill=cok_cs$Lcode))+
  geom_raster()+
  coord_fixed()+
  geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y,  fill=Lcode), size=3, shape=21)+
  labs(fill="Lcode", title = "Lcode")

c <- ggplot(data = cok_cs, aes(x=X, y=Y, fill=cok_cs$Ni_log.var))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Var", title = "Variance Ni_log")


grid.arrange(a,b,c, nrow=2)

The second plot reveals a problematic situation with Lcode. Since the sampling sites for SA are relatively sparsely distributed, the variogram could not capture any spatial variance changes even at the minimum bin, as can be seen in the variogram for SA. As a result, even in cells very close to sample sites that recorded SA, SA is not the “most probable” Lcode. This leads to rather odd results for the validation of Lcode, as shown in the following:




CM_m <- cvms::confusion_matrix(targets =dt[subset=="validation", Lcode], predictions = cok_cs_val$Lcode )

cvms::plot_confusion_matrix(CM_m$`Confusion Matrix`[[1]],
                      add_sums = TRUE,
                      add_normalized = FALSE,
                      diag_percentages_only = TRUE,
                      add_row_percentages = FALSE)

It can be seen that no validation data points were classified as UM or SA, although 27 and 32 members, respectively, exist in these classes. The sensitivity values lie at 65.7% for SM and 78.2% for FZ. From this, the overall accuracy lies at 60.2%, which is marginally better than the indicator kriging - by 1.4%. This is remarkable given the total misclassification of the minor groups UM and SA, and can be attributed to their relatively low contribution to the total number of samples.

7 Logratio-compositional Cokriging - VK3

With compositional modelling, the covariates are treated as parts of a whole that sums to one: when one variable decreases, one or several others have to increase to maintain the constant sum. As such, the data live on a multidimensional simplex. To illustrate this, the ternary diagrams for the abundant variables Fe, Mg and Al, as well as for the elements Co, Ni and Fe, are shown below. The diagram scales were centered so that the sparse variables can be compared visually to the abundant Fe.
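The logratio machinery used in this chapter (from the compositions package) can be sketched on a single three-part sample: acomp() closes the parts to a constant sum, alr() yields two logratios relative to the last part, and alrInv() recovers the closed composition.

```r
library(compositions)

x  <- acomp(c(Fe = 40, Mg = 10, Al = 5))  # closes the parts to sum 1
lr <- alr(x)                              # log(Fe/Al) and log(Mg/Al)

back <- alrInv(lr)  # inverse transform; parts again sum to 1
```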



dtcomp0 <-  dt %>% select(c("Fe", "Mg", "Al")) %>% acomp
plot.acomp(dtcomp0,center = T)


dtcomp0 <-  dt %>% select(c("Co", "Ni", "Fe")) %>% acomp
plot.acomp(dtcomp0,center = T)

They show, for example, that very high values of Mg correspond to low to medium Al values. Also, rising Mg values up to a certain point coincide with a relatively stable ratio of Fe to Al.

7.0.1 Swath Plots for compositions

Subsequently, swath plots for the compositions are used to identify whether a spatial trend needs to be considered.

X-direction

dtt <- dt[subset=="training",]

#dtt <- dt

dtcomp = dtt %>% select(Al:Ni) %>% acomp # calculate compositions
 
X = dtt %>% select(X, Y) #coordinates

swath(dtcomp, loc=X$X)  # quite flat --> constant mean

In X-direction, no larger trends are seen.

Y-direction


swath(dtcomp, loc=X$Y)

In Y-direction, local trends can be found. This is especially true for Mg-compositions at low Y-values, but also for Co and Al to some extent. However, this does not apply to all variables; as such, a constant mean is assumed.

7.0.2 Compositional variography

Omnidirectional Variograms

# define the gmSpatialModel object
gmv=NULL
gmv = make.gmCompositionalGaussianSpatialModel(
  data = dtcomp,  # compositional data
  coords = X,     # spatial coordinates
  V = "alr",      # use alr logratios
  formula = ~1+Y, # drift in Y-direction
  nmax = 20,      # cokriging neighbourhood has max. 20 samples
  maxdist = 300
)


# compute logratio variograms
lrvg = gmGeostats::logratioVariogram(gmv,maxdist=300, nbins=8)


# define a compositional linear model
lrmd = CompLinModCoReg(~nugget()+sph(200), comp=dtcomp)

# fit

lrmdf = gmGeostats::fit_lmc(v = lrvg, model=lrmd)
## iteration = 0
## Step:
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Parameter:
##  [1] 1.00000e+00 0.00000e+00 1.00000e+00 0.00000e+00 0.00000e+00 1.00000e+00
##  [7] 0.00000e+00 0.00000e+00 0.00000e+00 1.00000e+00 0.00000e+00 0.00000e+00
## [13] 2.77556e-17 0.00000e+00 1.00000e+00 2.00000e+02 1.00000e+00 0.00000e+00
## [19] 1.00000e+00 0.00000e+00 0.00000e+00 1.00000e+00 0.00000e+00 0.00000e+00
## [25] 0.00000e+00 1.00000e+00 0.00000e+00 0.00000e+00 2.77556e-17 0.00000e+00
## [31] 1.00000e+00
## Function Value
## [1] 10893
## Gradient:
##  [1]  -108.156693   297.619525  3049.300014  -368.288145  -748.182243
##  [6]  2766.682668  -779.763677   -15.053831  -921.636873 -6057.064429
## [11] -1334.451479   -78.861751  -564.866807  1160.271098  2968.953504
## [16]    -0.936274   -77.547524   216.663911  2492.878568  -331.235718
## [21]  -602.925544  2234.538662  -647.137816    -9.206993  -751.703716
## [26] -4987.016242 -1109.433486   -55.840799  -460.259636   958.687200
## [31]  2427.462565
## 
## iteration = 100
## Parameter:
##  [1]   1.1721934  -0.2596609   0.4860233  -0.1436749   0.0773978   0.6690391
##  [7]   0.6742597   0.1745889   1.6622803   0.3064601   0.3795978   0.1200620
## [13]  -0.0366835   0.3468574   0.2425170 199.6695968   1.0909144  -0.1408501
## [19]   0.3306164   0.2670870   0.2278202   0.7385879   1.4286961   0.0324616
## [25]   1.3912912   0.9490167   0.4769990   0.1523720  -0.1642027  -0.0517475
## [31]  -0.1553782
## Function Value
## [1] 49.3651
## Gradient:
##  [1] -0.0353743 -0.2617160 -0.1434245  0.3056782  0.4594159 -0.4388945
##  [7] -0.1237833  0.3831176  0.9725194  0.3064568 -0.3401379  0.6288761
## [13] -0.4948074 -0.4269850 -0.5709800  0.8095726 -0.3825595 -0.0790436
## [19]  0.1121595 -0.1432528 -0.0173355  0.1245203  0.4483971  0.1058077
## [25] -0.6995001 -0.3552509  0.1435383 -0.1836260  0.5693022  1.4424501
## [31]  0.1776168
## 
## Iterationslimit überschritten. Algorithmus fehlgeschlagen. (Iteration limit exceeded. Algorithm failed.)

# plot
plot(lrvg, lrvg=vgram2lrvgram(lrmdf)) 

Due to an unknown error, the anisotropic model could not be fitted. Hence, a general omnidirectional model is utilized. The (commented-out) code for the fitting of an anisotropic model is kept below.


# lrvga = gmGeostats::logratioVariogram(gmv, maxdist=300, nbins=10,
#                           azimuth=c(0,45,90,115), azimuth.tol=60)
# 
# colors <- c("blue", "red", "yellow", "purple")
# 
# plot(lrvga, col= colors)

# class(lrvga)
# 
# 
# lrmda = LMCAnisCompo(Z=dtcomp,azimuths=c(0,45,90,115), models=c("nugget", "sph"))
# 
# lrmdfa = gmGeostats::fit_lmc(v = lrvga,g=gmv, model=lrmda)
# 
# 
# plot(lrvga, lrvg=vgram2lrvgram(lrmdfa))

Directional variograms

Shown below is a plot of the directional behaviour of the variograms.

gmGeostats::logratioVariogram(gmv, maxdist=300, nbins=8,
                  azimuth=(0:11)*30, azimuth.tol=60) %>% 
  image

It can be seen that in E-W direction, a generally lower variation for most logratio-pairs is observed.

Subsequently, the kriging interpolations with the backtransformed logratio models are shown.

7.1 Results Logratio Cokriging

In the following, the kriging results for the backtransformed logratio compositions are shown for three elements: Al, Co and Ni.

gmv = make.gmCompositionalGaussianSpatialModel(
  data = dtcomp, # compositional data
  coords = X,    # spatial coordinates
  V = "alr",     # use alr logratios
 # formula = ~1+Y, # !!!creates bug at interpolation
  formula = ~1,  # constant mean, no drift
  nmax = 30,     # cokriging neighbourhood has max. 30 samples
  maxdist = 400,
  omax = 6,
  model = as.LMCAnisCompo(lrmdf)  # necessary conversion
)

#logratio cokriging
cokrig_alr = predict(gmv, newdata=xy_grid, debug.level=-1)
## starting cokriging 
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
 27% done
 41% done
 59% done
100% done

# cutoff of maximum variance
var_cutoff <- cokrig_alr$alr1.var %>% quantile(0.9, na.rm = T)
cokrig_alr <- cokrig_alr %>% as.data.table()
cokrig_alr <- cokrig_alr[alr1.var < var_cutoff]

#backtransforming to original variables
cokrig_compo = gsi.gstatCokriging2compo(
  cokrig_alr, V="alr", orignames = colnames(dtcomp))

#transform into percent
cokrig_compo <- cokrig_compo %>% as.data.frame %>% multiply_by(100)

#model performance for comparison

cok_alr_val <- predict(gmv, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6)
## starting cokriging 
## Linear Model of Coregionalization found. Good.
## [using ordinary cokriging]
## 
100% done
cok_alr_val = cbind(cok_alr_val[,1:2],
                    gsi.gstatCokriging2compo(
                      cok_alr_val, V="alr", orignames = colnames(dtcomp)) %>% 
                      as.data.frame %>% multiply_by(100)
)

Al

 
ggplot(data =`cokrig_alr`, aes(x=cokrig_alr$X, y=cokrig_alr$Y, fill=cokrig_compo$Al))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Al [%]")

Co

 
ggplot(data =`cokrig_alr`, aes(x=cokrig_alr$X, y=cokrig_alr$Y, fill=cokrig_compo$Co))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Co [%]")

Ni

 
ggplot(data =`cokrig_alr`, aes(x=cokrig_alr$X, y=cokrig_alr$Y, fill=cokrig_compo$Ni))+
  geom_raster()+
  coord_fixed()+
  scale_fill_gradientn(colors = c("blue2","purple", "red2"))+
 geom_point(data=dt[subset=="training",], inherit.aes = F, aes(x=X, y=Y), color="black", size=1)+
  labs(fill="Ni [%]")

8 Simulation VS1 - VS3

For the simulations, n = 10 conditional realisations are drawn with each of the three fitted cokriging models (VS1-VS3):

n=10

# simulation for cokriging with log-values
set.seed(123)
cosim = predict(gs_l, newdata=xy_grid, debug.level=-1, nmax=20, omax = 6, nsim = n)
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done


# simulation for logratio compositions

cos_alr = predict(gmv, newdata=xy_grid, debug.level=-1,nmax=20, omax=6, nsim=n) 
## starting cokriging 
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done
cos_alr1 <- cos_alr # keep a backup of the raw simulation output
# outcome analysis
class(cos_alr)  
## [1] "data.frame"
dim(cos_alr) 
## [1] 15759    52
colnames(cos_alr)
##  [1] "X"          "Y"          "alr1.sim1"  "alr1.sim2"  "alr1.sim3" 
##  [6] "alr1.sim4"  "alr1.sim5"  "alr1.sim6"  "alr1.sim7"  "alr1.sim8" 
## [11] "alr1.sim9"  "alr1.sim10" "alr2.sim1"  "alr2.sim2"  "alr2.sim3" 
## [16] "alr2.sim4"  "alr2.sim5"  "alr2.sim6"  "alr2.sim7"  "alr2.sim8" 
## [21] "alr2.sim9"  "alr2.sim10" "alr3.sim1"  "alr3.sim2"  "alr3.sim3" 
## [26] "alr3.sim4"  "alr3.sim5"  "alr3.sim6"  "alr3.sim7"  "alr3.sim8" 
## [31] "alr3.sim9"  "alr3.sim10" "alr4.sim1"  "alr4.sim2"  "alr4.sim3" 
## [36] "alr4.sim4"  "alr4.sim5"  "alr4.sim6"  "alr4.sim7"  "alr4.sim8" 
## [41] "alr4.sim9"  "alr4.sim10" "alr5.sim1"  "alr5.sim2"  "alr5.sim3" 
## [46] "alr5.sim4"  "alr5.sim5"  "alr5.sim6"  "alr5.sim7"  "alr5.sim8" 
## [51] "alr5.sim9"  "alr5.sim10"

#debugged version to backtransform results in one container
cos_alr <- cos_alr1
cos_alr <- cos_alr[,-c(1,2)]
alrs <- c("alr1", "alr2", "alr3", "alr4","alr5")
cos_comp <- list()
for (i in 1:n){
cos_comp[[i]] <- cos_alr %>% select(paste0(alrs,".sim", i)) %>%alrInv(z = .,orig = dtcomp)
}



# mixed model cosimulation

cos_cs = predict(gs_cs, newdata=xy_grid, debug.level = -1, nmax=20, omax = 6, nsim=n) %>% as.data.table()
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done

The following plots show one realisation of Ni for each of the three cosimulation approaches.



b <- ggplot(data = cosim, aes(x = X, y = Y, fill = exp(Ni_log.sim1))) +
  geom_raster() +
  coord_fixed() +
  scale_fill_gradientn(colors = c("blue2", "purple", "red2")) +
  labs(fill = "Ni [%]", title = "Cosimulation w. log-values")

a <- ggplot(data = cos_cs, aes(x = X, y = Y, fill = exp(Ni_log.sim1))) +
  geom_raster() +
  coord_fixed() +
  scale_fill_gradientn(colors = c("blue2", "purple", "red2")) +
  labs(fill = "Ni [%]", title = "Mixed model cosimulation w. log-values")

# re-attach the grid coordinates, which alrInv() does not carry along
c <- ggplot(data = cos_comp[[3]] %>% as.data.frame() %>% cbind(X = cosim$X, Y = cosim$Y),
            aes(x = X, y = Y, fill = Ni * 100)) +
  geom_raster() +
  coord_fixed() +
  scale_fill_gradientn(colors = c("blue2", "purple", "red2")) +
  labs(fill = "Ni [%]", title = "Compositional logratio cosimulation")

grid.arrange(b, a, c, nrow = 2)

8.1 Mean Values and Deviation

# mean value and coefficient of variation per grid node

# cosimulation with log-values

for (i in fkt_new) {
    cosim[,paste0(i,".mean")] <-  cosim %>% select(contains(i)) %>% exp() %>% apply(.,1, mean) %>% as.vector()
    cosim[,paste0(i,".vc")] <-  cosim %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
    
    }
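The reduction applied in the loops above (per-node mean and coefficient of variation sd/mean over the realisations) can be illustrated on a toy matrix; `sims`, `cell_mean` and `cell_vc` are hypothetical names used only for this sketch:

```r
# Toy example: 4 grid nodes x 5 realisations of back-transformed grades
set.seed(1)
sims <- matrix(rlnorm(20, meanlog = 0, sdlog = 0.3), nrow = 4, ncol = 5)

cell_mean <- apply(sims, 1, mean)                        # E-type estimate per node
cell_vc   <- apply(sims, 1, function(x) sd(x) / mean(x)) # coefficient of variation
```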

# mixed model
for (i in fkt_new) {
    cos_cs[,paste0(i,".mean")] <-  cos_cs %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector() 
    cos_cs[,paste0(i,".vc")] <-  cos_cs %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}

# logratio cosimulation

cos_comp1 <- array(as.numeric(unlist(cos_comp)), dim=c(nrow(cos_comp[[1]]),
                                               ncol(cos_comp[[1]]),
                                                    n))

cos_comp_sum <- data.table()
o <- 0
for (i in colnames.num) {
  o <- o+1
  cos_comp_sum[,paste0(i,".mean")] <- cos_comp1[,o,] %>% as.data.frame() %>% apply(.,1, mean) %>% as.vector() *100
  cos_comp_sum[,paste0(i,".vc")] <- cos_comp1[,o,]  %>% as.data.frame() %>% apply(.,1, function (x)(sd(x*100)/mean(x*100))) %>% as.vector()
}

cos_comp_sum <- cbind(cos_comp_sum, X=cosim$X, Y=cosim$Y)

The plots of the simulated mean values are again shown, using Ni as an example.

Mean values


b <- ggplot(data = cosim, aes(x = X, y = Y, fill = Ni_log.mean)) +
  geom_raster() +
  coord_fixed() +
  scale_fill_gradientn(colors = c("blue2", "purple", "red2")) +
  labs(fill = "Ni [%]", title = "Cosimulation w. log-values")

a <- ggplot(data = cos_cs, aes(x = X, y = Y, fill = Ni_log.mean)) +
  geom_raster() +
  coord_fixed() +
  scale_fill_gradientn(colors = c("blue2", "purple", "red2")) +
  labs(fill = "Ni [%]", title = "Mixed model cosimulation w. log-values")

c <- ggplot(data = cos_comp_sum, aes(x = X, y = Y, fill = Ni.mean)) +
  geom_raster() +
  coord_fixed() +
  scale_fill_gradientn(colors = c("blue2", "purple", "red2")) +
  labs(fill = "Ni [%]", title = "Compositional logratio cosimulation")

grid.arrange(b, a, c, nrow = 2)

While VS1 and VS2 show very similar behaviour, VS3 shows slightly smaller and less “peaky” values.

Variation coefficient


b <- ggplot(data = cosim, aes(x = X, y = Y, fill = Ni_log.vc)) +
  geom_raster() +
  coord_fixed() +
  scale_fill_gradientn(colors = c("blue2", "purple", "red2")) +
  labs(fill = "VC", title = "Cosimulation w. log-values")

a <- ggplot(data = cos_cs, aes(x = X, y = Y, fill = Ni_log.vc)) +
  geom_raster() +
  coord_fixed() +
  scale_fill_gradientn(colors = c("blue2", "purple", "red2")) +
  labs(fill = "VC", title = "Mixed model cosimulation w. log-values")

c <- ggplot(data = cos_comp_sum, aes(x = X, y = Y, fill = Ni.vc)) +
  geom_raster() +
  coord_fixed() +
  scale_fill_gradientn(colors = c("blue2", "purple", "red2")) +
  labs(fill = "VC", title = "Compositional logratio cosimulation")

grid.arrange(b, a, c, nrow = 2)

It can be seen that large differences appear between the three variants in the spatial pattern of the variation coefficient.


n=10

# simulation for cokriging with log-values V1
set.seed(123)
cosim_val = predict(gs_l, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1, nmax=20, omax = 6, nsim = n)
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done



# mixed model cosimulation

cos_cs_val = predict(gs_cs, newdata=dt[subset=="validation", c("X", "Y")], debug.level = -1, nmax=20, omax = 6, nsim=n) %>% as.data.table()
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done


# simulation for logratio compositions

cos_alr_val = predict(gmv, newdata=dt[subset=="validation", c("X", "Y")], debug.level=-1,nmax=20, omax=6, nsim=n) 
## starting cokriging 
## drawing 10 multivariate GLS realisations of beta...
## Linear Model of Coregionalization found. Good.
## [using conditional Gaussian cosimulation]
## 
100% done

# outcome analysis: structure of the (grid) cosimulation output
class(cos_alr)  
## [1] "data.frame"
dim(cos_alr) 
## [1] 15759    50
colnames(cos_alr)
##  [1] "alr1.sim1"  "alr1.sim2"  "alr1.sim3"  "alr1.sim4"  "alr1.sim5" 
##  [6] "alr1.sim6"  "alr1.sim7"  "alr1.sim8"  "alr1.sim9"  "alr1.sim10"
## [11] "alr2.sim1"  "alr2.sim2"  "alr2.sim3"  "alr2.sim4"  "alr2.sim5" 
## [16] "alr2.sim6"  "alr2.sim7"  "alr2.sim8"  "alr2.sim9"  "alr2.sim10"
## [21] "alr3.sim1"  "alr3.sim2"  "alr3.sim3"  "alr3.sim4"  "alr3.sim5" 
## [26] "alr3.sim6"  "alr3.sim7"  "alr3.sim8"  "alr3.sim9"  "alr3.sim10"
## [31] "alr4.sim1"  "alr4.sim2"  "alr4.sim3"  "alr4.sim4"  "alr4.sim5" 
## [36] "alr4.sim6"  "alr4.sim7"  "alr4.sim8"  "alr4.sim9"  "alr4.sim10"
## [41] "alr5.sim1"  "alr5.sim2"  "alr5.sim3"  "alr5.sim4"  "alr5.sim5" 
## [46] "alr5.sim6"  "alr5.sim7"  "alr5.sim8"  "alr5.sim9"  "alr5.sim10"

#debugged version to backtransform results in one container

cos_alr_val <- cos_alr_val[,-c(1,2)]
alrs <- c("alr1", "alr2", "alr3", "alr4","alr5")
cos_comp_val <- list()
for (i in 1:n){
cos_comp_val[[i]] <- cos_alr_val %>% select(paste0(alrs,".sim", i)) %>%alrInv(z = .,orig = dtcomp)
}





# mean values for validation

# cosimulation with log-values

for (i in fkt_new) {
    cosim_val[,paste0(i,".mean")] <-  cosim_val %>% select(contains(i)) %>% exp() %>% apply(.,1, mean) %>% as.vector()
    cosim_val[,paste0(i,".vc")] <-  cosim_val %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
    
    }

# mixed model
for (i in fkt_new) {
    cos_cs_val[,paste0(i,".mean")] <-  cos_cs_val %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector() 
    cos_cs_val[,paste0(i,".vc")] <-  cos_cs_val %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
}

# logratio cosimulation

cos_comp1_val <- array(as.numeric(unlist(cos_comp_val)), dim=c(nrow(cos_comp_val[[1]]),
                                               ncol(cos_comp_val[[1]]),
                                                    n))

cos_comp_sum_val <- data.table()
o <- 0
for (i in colnames.num) {
  o <- o+1
  cos_comp_sum_val[,paste0(i,".mean")] <- cos_comp1_val[,o,] %>% as.data.frame() %>% apply(.,1, mean) %>% as.vector() *100
  cos_comp_sum_val[,paste0(i,".vc")] <- cos_comp1_val[,o,]  %>% as.data.frame() %>% apply(.,1, function (x)(sd(x*100)/mean(x*100))) %>% as.vector()
}

cos_comp_sum_val <- cbind(cos_comp_sum_val,
                          X=dt[subset=="validation", c("X")],
                          Y=dt[subset=="validation", c("Y")])

9 Model performance - Cross validation

In order to evaluate the performance of the six model variants, the validation subset of the data set is used.

The mean absolute error (MAE) of the predictions, in %, is calculated for each element.
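The error measure can be written as a small helper (a sketch with hypothetical example values; below, the report computes it inline per element):

```r
# Mean absolute error on the back-transformed scale; since predictions
# and observations are both in %, the MAE is in % as well.
mae <- function(actual, pred) mean(abs(actual - pred))

err <- mae(c(1.2, 0.8, 1.5), c(1.0, 0.9, 1.4))  # hypothetical grades in %
```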

#dataframe consisting of predicted and actual validation values

#for VK1 cokriging with log values 
dt.kv_l <- data.table(cokriged_val %>% select(contains(".pred")),
                      dt[subset=="validation"] %>% select(fkt_new))



sumry_l <- dt.kv_l %>% summarise(.,
                                 Name = "VK1",
                Al_me = mean(abs(exp(Al_log)-exp(Al_log.pred))),
                Co_me = mean(abs(exp(Co_log)-exp(Co_log.pred))),
                Fe_me = mean(abs(exp(Fe_log)-exp(Fe_log.pred))),
                Filler_me = mean(abs(exp(Filler_log)-exp(Filler_log.pred))),
                Mg_me = mean(abs(exp(Mg_log)-exp(Mg_log.pred))),
                Ni_me = mean(abs(exp(Ni_log)-exp(Ni_log.pred))))

#for VK2 mixed model cokriging
dt.kv_cs <- data.table(cok_cs_val %>% select(paste0(fkt_new,".pred")),
                      dt[subset=="validation"] %>% select(fkt_new))



sumry_cs <- dt.kv_cs %>% summarise(.,
                                   Name = "VK2",
                Al_me = mean(abs(exp(Al_log)-exp(Al_log.pred))),
                Co_me = mean(abs(exp(Co_log)-exp(Co_log.pred))),
                Fe_me = mean(abs(exp(Fe_log)-exp(Fe_log.pred))),
                Filler_me = mean(abs(exp(Filler_log)-exp(Filler_log.pred))),
                Mg_me = mean(abs(exp(Mg_log)-exp(Mg_log.pred))),
                Ni_me = mean(abs(exp(Ni_log)-exp(Ni_log.pred))))


# for VK3 logratio cokriging

dt.alr <- data.table(cok_alr_val[,-c(1,2)] ,
                     dt[subset=="validation"] %>% select(fkt_new) %>% exp)


sumry_alr <- dt.alr %>% summarise(.,
                                  Name = "VK3",
                Al_me = mean(abs((Al_log)-(Al))),
                Co_me = mean(abs((Co_log)-(Co))),
                Fe_me = mean(abs((Fe_log)-(Fe))),
                Filler_me = mean(abs((Filler_log)-(Filler))),
                Mg_me = mean(abs((Mg_log)-(Mg))),
                Ni_me = mean(abs((Ni_log)-(Ni))))

#######for cosimulations######
#for VS1 cosimulation with log values 
dt.cv_l <- data.table(cosim_val %>% select(contains(".mean")),
                      dt[subset=="validation"] %>% select(fkt_new))



sumry_cvl <- dt.cv_l %>% summarise(.,
                                 Name = "VS1",
                Al_me = mean(abs(exp(Al_log)-(Al_log.mean))),
                Co_me = mean(abs(exp(Co_log)-(Co_log.mean))),
                Fe_me = mean(abs(exp(Fe_log)-(Fe_log.mean))),
                Filler_me = mean(abs(exp(Filler_log)-(Filler_log.mean))),
                Mg_me = mean(abs(exp(Mg_log)-(Mg_log.mean))),
                Ni_me = mean(abs(exp(Ni_log)-(Ni_log.mean))))

#for VS2 mixed model cosimulation
dt.cv_cs <- data.table(cos_cs_val %>% select(paste0(fkt_new,".mean")),
                      dt[subset=="validation"] %>% select(fkt_new))



sumry_cv_cs <- dt.cv_cs %>% summarise(.,
                                   Name = "VS2",
                Al_me = mean(abs(exp(Al_log)-(Al_log.mean))),
                Co_me = mean(abs(exp(Co_log)-(Co_log.mean))),
                Fe_me = mean(abs(exp(Fe_log)-(Fe_log.mean))),
                Filler_me = mean(abs(exp(Filler_log)-(Filler_log.mean))),
                Mg_me = mean(abs(exp(Mg_log)-(Mg_log.mean))),
                Ni_me = mean(abs(exp(Ni_log)-(Ni_log.mean))))


# for VS3 logratio cosimulation

dt.cos_alr <- data.table(cos_comp_sum_val ,
                     dt[subset=="validation"] %>% select(fkt_new) %>% exp)


sumry_cos_alr <- dt.cos_alr %>% summarise(.,
                                  Name = "VS3",
                Al_me = mean(abs((Al_log)-(Al.mean))),
                Co_me = mean(abs((Co_log)-(Co.mean))),
                Fe_me = mean(abs((Fe_log)-(Fe.mean))),
                Filler_me = mean(abs((Filler_log)-(Filler.mean))),
                Mg_me = mean(abs((Mg_log)-(Mg.mean))),
                Ni_me = mean(abs((Ni_log)-(Ni.mean))))



####Summary####
sum_val <- rbind(sumry_l, sumry_cs, sumry_alr,
                 sumry_cvl,sumry_cv_cs,sumry_cos_alr) %>% kable(table.attr = "style = \"color: black;\"") %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
sum_val
Name Al_me Co_me Fe_me Filler_me Mg_me Ni_me
VK1 2.202556 0.0447709 8.856579 7.082687 3.946235 0.3153578
VK2 2.270978 0.0443517 9.107742 7.221644 4.011018 0.3164846
VK3 2.202224 0.0445384 8.807351 8.020995 3.882599 0.3085122
VS1 2.644393 0.0549201 9.684120 7.184744 4.175863 0.3193899
VS2 2.479498 0.0544557 8.930254 7.071952 4.039620 0.3149048
VS3 6.447633 0.0540879 9.678661 8.847417 3.863677 0.3230897






# temp <- list()

# for (i in fkt_new) {
# temp[[i]] <- ggplot(data=dt.kv_l, aes(x = get(paste0(i, ".pred")), y=get(i)))+
#   geom_point(size=0.5)+
#   geom_abline(intercept = 0)+
#   labs(title = i, x="Pred. V.", y= "Real V.")
#
#
# }
#
# do.call("grid.arrange", c(temp, ncol=3))

The table shows that the variants generally yield similar errors; the kriging variants perform slightly better than the simulation means, and the compositional logratio cosimulation stands out with a clearly higher error for Al.

To obtain additional detail, the predicted values are plotted against the real values, giving a qualitative insight into the model quality.


i <- "Ni"

a <- ggplot(data=dt.kv_l, aes(x = get(paste0(i, "_log.pred"))%>% exp(), y=get(paste0(i, "_log"))%>% exp()))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VK1", x="Pred. V.", y= "Real V.")



b <- ggplot(data=dt.kv_cs, aes(x = get(paste0(i, "_log.pred"))%>% exp(), y=get(paste0(i, "_log"))%>% exp()))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VK2", x="Pred. V.", y= "Real V.")


c <- ggplot(data=dt.alr, aes(x = get(paste0(i)), y=get(paste0(i, "_log"))))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VK3", x="Pred. V.", y= "Real V.")


d <- ggplot(data=dt.cv_l, aes(x = get(paste0(i, "_log.mean")), y=get(paste0(i, "_log"))%>% exp()))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VS1", x="Pred. V.", y= "Real V.")


e <- ggplot(data=dt.cv_cs, aes(x = get(paste0(i, "_log.mean")), y=get(paste0(i, "_log")) %>% exp()))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VS2", x="Pred. V.", y= "Real V.")



f <- ggplot(data=dt.cos_alr, aes(x = get(paste0(i, ".mean")), y= get(paste0(i, "_log"))))+
  geom_point(size=0.5)+
  geom_abline(intercept = 0)+
  coord_fixed()+
  labs(title = "VS3", x="Pred. V.", y= "Real V.")



 grid.arrange(a,b,c,d,e,f, nrow=2)

It can be seen that …

10 Final Resource Estimation

VS2 is used for the final estimation, as it showed the best overall results among the simulation variants.

cos_cs_f_pred <- readRDS(file = "prediction_final.RDS")

# calculate mean values for chemicals
for (i in fkt_new) {
    cos_cs_f_pred[,paste0(i,".mean")] <-  cos_cs_f_pred %>% select(contains(i))%>% exp() %>% apply(.,1, mean) %>% as.vector() 
    cos_cs_f_pred[,paste0(i,".vc")] <-  cos_cs_f_pred %>% select(contains(i))%>% exp() %>% apply(.,1, function (x)(sd(x)/mean(x))) %>% as.vector()
   
}
# per-node median of the simulated indicators for each Lcode

for (i in Lcodes) {
  cos_cs_f_pred[,(paste0(i,".sum"))] <-
    cos_cs_f_pred %>%
    select(contains(paste0(i, ".sim"))) %>% apply(.,1, median) %>% as.vector()
}
# assign each node the Lcode with the highest median indicator value
cos_cs_f_pred$Lcode.pred <- cos_cs_f_pred %>%
  select(contains(".sum")) %>% max.col() %>% factor(x = ., levels = c(1,2,3,4), labels = Lcodes)
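The assignment above is effectively a majority vote: per grid node the median over the realisations is taken for each lithology indicator, and `max.col()` picks the class with the largest value. A minimal sketch with hypothetical median values:

```r
# Hypothetical per-node median indicator values for the four lithologies
Lcodes_demo <- c("FZ", "SA", "SM", "UM")
med <- matrix(c(0.8, 0.1, 0.1, 0.0,   # node 1: FZ dominates
                0.1, 0.2, 0.6, 0.1),  # node 2: SM dominates
              nrow = 2, byrow = TRUE)

# max.col() returns the column index of each row maximum; map it to labels
pred <- factor(max.col(med), levels = 1:4, labels = Lcodes_demo)
```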

In the following, the predicted Lcode and Ni are shown, with the sample sites overlaid and coloured by their Lcode. The second graph is interactive: zoom can be used, and the sampling points can be switched off by clicking on their symbols in the legend.

Lcode

a <- ggplot(data = cos_cs_f_pred, aes(x = X, y = Y, fill = as.character(Lcode.pred))) +
  geom_raster() +
  coord_fixed() +
  geom_point(data = dt, inherit.aes = F, aes(x = X, y = Y, fill = as.character(Lcode)),
             size = 3, shape = 21, alpha = 0.5) +
  labs(fill = "Lcode", title = "Lcode")

a

Interactive Plot Ni and Sample sites LCode

plotly::plot_ly(title="Ni [%]") %>% 
  add_rasterly_heatmap(cos_cs_f_pred,
                       x= cos_cs_f_pred$X,
                       y= cos_cs_f_pred$Y,
                       z=cos_cs_f_pred$Ni_log.mean,
                       colorscale='Viridis') %>% 
 plotly::add_markers(data = dt, x=~X, y= ~Y, color= dt$Lcode,size=1,
                     marker=list(sizeref=10)) %>%
  plotly::layout(title ="Ni [%] and Lcode sampling points",autosize = F )

It can be seen that the model fits very small patches for the sparsely sampled lithologies SA and UM. It also tends to underestimate the peak behaviour of certain points; this, however, is due to the fact that the mean values over the realisations are plotted.

10.1 Cumulative distribution

cos_cs_f_pred %<>% as.data.table()
dt.m <- melt.data.table(cos_cs_f_pred %>% select(contains(".sim")) %>% 
                          select(contains(fkt_new))) %>% as.data.table()
dt.m$value <- dt.m$value %>% exp()
Mg_max <- dt.m[variable %like% ("Mg")]$value %>% max() %>% round()
Mg_q999 <- dt.m[variable %like% ("Mg")]$value %>% quantile(.,0.999) %>% round()

In the following, the cumulative grade distribution for the individual elements is shown. The values are truncated at the 0.999-quantile, because the chosen simulation method on rare occasions produces very high values at a grid node. For example, the maximum for Mg lies at 215 %, which is clearly impossible and can be attributed to the fact that a non-compositional method is used here. The 0.999-quantile, however, lies at 73 %. The vertical lines in the plots indicate the maximum and the 0.999-quantile.
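The truncation can be sketched as follows (toy data, not the report's simulations; note that the plots below truncate only the axis via `xlim()`):

```r
# Toy right-skewed simulation output with one unphysically large value
set.seed(42)
vals <- c(rlnorm(1000, meanlog = 1, sdlog = 0.5), 215)

q999 <- quantile(vals, 0.999)     # truncation threshold
vals_trunc <- vals[vals <= q999]  # drop the extreme upper tail
```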



temp <- list()
o=0
for (i in fkt_new) {
o=o+1
  temp[[i]] <- 
  ggplot(dt.m[variable %like% (i)], aes(x = value, group = variable)) +
stat_ecdf(pad=F)+
    geom_vline(xintercept = dt.m[variable %like% (i)]$value %>% max())+
    geom_vline(xintercept = dt.m[variable %like% (i)]$value %>% quantile(.,0.999))+
  labs(title=fkt[[o]], x = "content [%]")+
    xlim(0,dt.m[variable %like% (i)]$value %>% quantile(.,0.999))
}

do.call("grid.arrange", c(temp, ncol=3))

10.2 Grade-tonnage Curves

To calculate the grade-cutoff tables underlying the grade-tonnage curves, a custom function, as seen in the code, is used.



# function to calculate a grade-cutoff table
GraTon.Tab <- function(use, max.quantile=0.999, nsteps = 100) {
  Q <- use %>% quantile(.,max.quantile) # truncate the cutoff range at this quantile
  steps <- seq(from=0, to=Q, length.out=nsteps) # grid of cutoff values
  gt <- data.table(cutoff=steps) # basic table
  gt$n <- lapply(steps, function(x)(length(which(use>x)))) %>% unlist() # number of cells above cutoff
  gt$prop <- gt$n/max(gt$n) # proportion of cells above cutoff
  gt$mean <- lapply(steps, function(x)(mean(use[use>x]))) %>% unlist() # mean grade above cutoff
  gt
}
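As a sanity check of the logic, the same computation can be restated in base R without `data.table`/`magrittr` (`graton_tab` is a hypothetical name for this self-contained sketch): the proportion above cut-off must be non-increasing and the mean grade above cut-off non-decreasing.

```r
# Base-R restatement of the grade-cutoff table above
graton_tab <- function(use, max.quantile = 0.999, nsteps = 100) {
  Q <- quantile(use, max.quantile)              # truncate cutoffs here
  steps <- seq(from = 0, to = Q, length.out = nsteps)
  n <- sapply(steps, function(x) sum(use > x))  # cells above each cutoff
  data.frame(cutoff = steps,
             n = n,
             prop = n / max(n),                                     # tonnage proportion
             mean = sapply(steps, function(x) mean(use[use > x])))  # grade above cutoff
}

set.seed(7)
gt <- graton_tab(rlnorm(500, meanlog = 0, sdlog = 0.6), nsteps = 50)
```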
#create individual cutoff tables for all iterations and merge them into one table

temp <- list()
CO_all <- list()
coeff <- NULL
for (i in fkt) {
  #apply to extract all iterations of given element and calculate single cutoff-table
  CO_all[[i]] <- lapply(dt.m[variable %like% (i)]$variable %>% unique(),
       function(x) (GraTon.Tab(use = dt.m[variable == (x)]$value,max.quantile = 0.99))) %>% 
  rbindlist(l = .,idcol = "Nr") #coerce individual tables into one
   
  
  #coefficient for secondary axis transformation
 coeff[paste(i)] <- max(CO_all[[i]]$mean) 
 CO_all[[i]]$mean <- CO_all[[i]]$mean/coeff[paste(i)]
 #formula for secondary axis transformation
frm = as.formula(paste0("~.*",coeff[paste(i)]))

  temp[[i]] <- ggplot(CO_all[[i]], aes(x=cutoff, group = Nr)) +
    geom_line(aes( y= prop), color="blue3")+
    geom_line(aes(y= mean), color = "Orange3")+
  scale_y_continuous(name= "Proportion above cut-off",
                     sec.axis = sec_axis(trans=frm, name="mean grade [%]"))+
    labs(title = i)
  
  }
do.call("grid.arrange", c(temp, ncol=3))

11 Summary

In this report, six model variants have been applied to investigate their relative suitability for a resource estimation problem in part of a case study. The modelling was preceded by an exploratory data analysis to decide on the appropriate methods and data transformations. The results have shown that the error between the models is …. As such, the best performing model was …, however, the … .

The model shows a certain anisotropy in direction WNW-SSE. This anisotropy was only modelled in the .. approach.

12 Sources

[1] - Talebi, H., Mueller, U., Tolosana-Delgado, R. et al. Geostatistical Simulation of Geochemical Compositions in the Presence of Multiple Geological Units: Application to Mineral Resource Evaluation. Math Geosci 51, 129–153 (2019). https://doi.org/10.1007/s11004-018-9763-9

[2] - Makowski, D.; Ben-Shachar, M.; Patil, I.; Lüdecke, D. (2020): Methods and Algorithms for Correlation Analysis in R. JOSS 5(51), p. 2306. DOI: 10.21105/joss.02306.

[3] - Tolosana-Delgado, R. and Menzel P., Block Course: Practical Geostatistics, TU Bergakademie Freiberg, 2021

[4] - Friendly, M.; Working with categorical data with R and the vcd and vcdExtra packages, York University, Toronto, 2021. https://cran.r-project.org/web/packages/vcdExtra/vignettes/vcd-tutorial.pdf

[5] - Olsen, L.R.; The available metrics in cvms https://cran.r-project.org/web/packages/cvms/vignettes/available_metrics.html#multinomial-metrics

[6] - Journel, A.G., Rossi, M.E. When do we need a trend model in kriging?. Math Geol 21, 715–739 (1989). https://doi.org/10.1007/BF00893318